Margin of error
The margin of error is a statistic expressing the amount of random sampling error in the results of a survey. The larger the margin of error, the less confidence one should have that a poll result would reflect the result of a census of the entire population. The margin of error will be positive whenever a population is incompletely sampled and the outcome measure has positive variance, which is to say, the measure ''varies''. The term ''margin of error'' is often used in non-survey contexts to indicate observational error in reporting measured quantities.


Concept

Consider a simple ''yes/no'' poll P as a sample of n respondents drawn from a population N\ (n \ll N), reporting the percentage p of ''yes'' responses. We would like to know how close p is to the true result of a survey of the entire population N, without having to conduct one. If, hypothetically, we were to conduct poll P over subsequent samples of n respondents (newly drawn from N), we would expect those subsequent results p_1, p_2, \ldots to be normally distributed about their mean \overline{p}. The ''margin of error'' describes the distance within which a specified percentage of these results is expected to vary from \overline{p}. According to the 68-95-99.7 rule, we would expect that 95% of the results p_1, p_2, \ldots will fall within ''about'' two standard deviations (\pm 2\sigma_P) either side of the true mean \overline{p}. This interval is called the confidence interval, and the ''radius'' (half the interval) is called the ''margin of error'', corresponding to a 95% ''confidence level''.

Generally, at a confidence level \gamma, a sample sized n of a population having expected standard deviation \sigma has a margin of error

:MOE_\gamma = z_\gamma \times \sqrt{\frac{\sigma^2}{n}}

where z_\gamma denotes the ''quantile'' (also, commonly, a ''z-score''), and \sqrt{\frac{\sigma^2}{n}} is the ''standard error''.
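As a minimal sketch of the formula above (the helper name `margin_of_error` is ours, not a standard API), the quantile z_\gamma can be taken from the standard library's `statistics.NormalDist`:

```python
from statistics import NormalDist

def margin_of_error(sigma: float, n: int, confidence: float = 0.95) -> float:
    """MOE_gamma = z_gamma * sqrt(sigma^2 / n)."""
    # Two-sided quantile: P(-z < Z < z) = confidence
    z = NormalDist().inv_cdf((1 + confidence) / 2)
    return z * (sigma ** 2 / n) ** 0.5

# With sigma = 0.5 and n = 1013:
print(round(margin_of_error(0.5, 1013), 3))  # 0.031
```

The same function reproduces the worked percentages later in the article, up to rounding of the tabulated z values.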


Standard deviation and standard error

We would expect the normally distributed values p_1, p_2, \ldots to have a standard deviation which somehow varies with n: the smaller n, the wider the margin of error. This is called the standard error \sigma_{\overline{p}}. For the single result from our survey, we ''assume'' that p = \overline{p}, and that ''all'' subsequent results p_1, p_2, \ldots together would have a variance \sigma_P^2 = P(1-P).

:\text{Standard error} = \sigma_{\overline{p}} \approx \sqrt{\frac{\sigma_P^2}{n}} \approx \sqrt{\frac{p(1-p)}{n}}

Note that p(1-p) corresponds to the variance of a Bernoulli distribution.
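The estimated standard error of a sample proportion is a one-liner; as a sketch (the function name is ours):

```python
def standard_error(p: float, n: int) -> float:
    """sqrt(p(1-p)/n): estimated standard error of a sample proportion."""
    # p(1-p) is the variance of a Bernoulli(p) variable
    return (p * (1 - p) / n) ** 0.5

print(round(standard_error(0.5, 1013), 4))  # 0.0157
```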


Maximum margin of error at different confidence levels

For a confidence ''level'' \gamma, there is a corresponding confidence ''interval'' about the mean, \mu \pm z_\gamma\sigma, that is, the interval [\mu - z_\gamma\sigma,\ \mu + z_\gamma\sigma] within which values of P should fall with probability \gamma. Precise values of z_\gamma are given by the quantile function of the normal distribution (which the 68-95-99.7 rule approximates). Note that z_\gamma is undefined for |\gamma| \ge 1; that is, z_{1.00} is undefined, as is z_{1.10}.

Since \max \sigma_P^2 = \max P(1-P) = 0.25 at p = 0.5, we can arbitrarily set p = \overline{p} = 0.5, calculate \sigma_P, \sigma_{\overline{p}}, and z_\gamma\sigma_{\overline{p}} to obtain the ''maximum'' margin of error for P at a given confidence level \gamma and sample size n, even before having actual results. With p = 0.5 and n = 1013:

:MOE_{95}(0.5) = z_{0.95}\sigma_{\overline{p}} \approx z_{0.95}\sqrt{\frac{0.5^2}{n}} = 1.96\sqrt{\frac{0.25}{n}} = 0.98/\sqrt{n} = \pm 3.1\%
:MOE_{99}(0.5) = z_{0.99}\sigma_{\overline{p}} \approx z_{0.99}\sqrt{\frac{0.5^2}{n}} = 2.58\sqrt{\frac{0.25}{n}} = 1.29/\sqrt{n} = \pm 4.1\%

Also, usefully, for any reported MOE_{95},

:MOE_{99} = \frac{z_{0.99}}{z_{0.95}} MOE_{95} \approx 1.3 \times MOE_{95}
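The worst-case calculation can be sketched directly (helper name is ours; exact quantiles are used rather than the rounded constants 1.96 and 2.58):

```python
from statistics import NormalDist

def max_margin_of_error(n: int, confidence: float) -> float:
    """Worst-case MOE, taking p = 0.5 where p(1 - p) peaks at 0.25."""
    z = NormalDist().inv_cdf((1 + confidence) / 2)
    return z * (0.25 / n) ** 0.5

print(round(max_margin_of_error(1013, 0.95), 4))  # 0.0308, about ±3.1%
print(round(max_margin_of_error(1013, 0.99), 4))  # 0.0405, about ±4.1%
```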


Specific margins of error

If a poll has multiple percentage results (for example, a poll measuring a single multiple-choice preference), the result closest to 50% will have the highest margin of error. Typically, it is this number that is reported as the margin of error for the entire poll. Imagine poll P reports p_a, p_b, p_c as 71%, 27%, 2%, with n = 1013:

:MOE_{95}(P_a) = z_{0.95}\sigma_{\overline{p}} \approx 1.96\sqrt{\frac{p_a(1-p_a)}{n}} = 0.89/\sqrt{n} = \pm 2.8\%
:MOE_{95}(P_b) = z_{0.95}\sigma_{\overline{p}} \approx 1.96\sqrt{\frac{p_b(1-p_b)}{n}} = 0.87/\sqrt{n} = \pm 2.7\%
:MOE_{95}(P_c) = z_{0.95}\sigma_{\overline{p}} \approx 1.96\sqrt{\frac{p_c(1-p_c)}{n}} = 0.27/\sqrt{n} = \pm 0.8\%

As a given percentage approaches the extremes of 0% or 100%, its margin of error approaches ±0%.
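A short sketch computing the per-result margins for such a poll (the `moe` helper is ours):

```python
from statistics import NormalDist

Z_95 = NormalDist().inv_cdf(0.975)  # about 1.96

def moe(p: float, n: int) -> float:
    """95% margin of error for a single reported proportion p."""
    return Z_95 * (p * (1 - p) / n) ** 0.5

for p in (0.71, 0.27, 0.02):
    print(f"p = {p:.0%}: MOE = ±{moe(p, 1013):.1%}")
```

Note that with the exact quantile the last value prints as ±0.9% rather than the ±0.8% obtained from the rounded constant 0.27; both round-offs of 1.96 × sqrt(0.02 × 0.98 / 1013) ≈ 0.0086 are common in practice.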


Comparing percentages

Imagine multiple-choice poll P reports p_a, p_b, p_c as 46%, 42%, 12%, with n = 1013. As described above, the margin of error reported for the poll would typically be MOE_{95}(P_a), as p_a is closest to 50%. The popular notion of a ''statistical tie'' or ''statistical dead heat'', however, concerns itself not with the accuracy of the individual results, but with that of the ''ranking'' of the results. Which is in first? If, hypothetically, we were to conduct poll P over subsequent samples of n respondents (newly drawn from N), and report the result p_w = p_a - p_b, we could use the ''standard error of difference'' to understand how p_{w_1}, p_{w_2}, p_{w_3}, \ldots would be expected to fall about \overline{p_w}. For this, we need to apply the ''sum of variances'' to obtain a new variance, \sigma_{P_w}^2,

:\sigma_{P_w}^2 = \sigma_{P_a - P_b}^2 = \sigma_{P_a}^2 + \sigma_{P_b}^2 - 2\sigma_{P_a,P_b} = p_a(1-p_a) + p_b(1-p_b) + 2p_a p_b

where \sigma_{P_a,P_b} = -P_a P_b is the covariance of P_a and P_b. Thus (after simplifying),

:\text{standard error of difference} = \sigma_{\overline{w}} \approx \sqrt{\frac{\sigma_{P_w}^2}{n}} = \sqrt{\frac{0.46 \times 0.54 + 0.42 \times 0.58 + 2 \times 0.46 \times 0.42}{1013}} = 0.029, \quad P_w = P_a - P_b = 4\%
:MOE_{95}(P_w) = z_{0.95}\sigma_{\overline{w}} \approx \pm 5.8\%
:MOE_{90}(P_w) = z_{0.90}\sigma_{\overline{w}} \approx \pm 4.8\%

Note that this assumes that P_c is close to constant, that is, respondents choosing either A or B would almost never choose C (making P_a and P_b close to ''perfectly negatively correlated''). With three or more choices in closer contention, choosing a correct formula for \sigma_{P_w}^2 becomes more complicated.
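The standard-error-of-difference calculation can be sketched as follows (the helper name is ours, and it bakes in the cov(P_a, P_b) = -p_a p_b assumption discussed above):

```python
from statistics import NormalDist

def moe_of_lead(p_a: float, p_b: float, n: int, confidence: float = 0.95) -> float:
    """MOE of the lead p_a - p_b, assuming cov(P_a, P_b) = -p_a * p_b."""
    # Sum of variances minus twice the (negative) covariance
    var_w = p_a * (1 - p_a) + p_b * (1 - p_b) + 2 * p_a * p_b
    z = NormalDist().inv_cdf((1 + confidence) / 2)
    return z * (var_w / n) ** 0.5

print(f"±{moe_of_lead(0.46, 0.42, 1013):.1%}")        # ±5.8%
print(f"±{moe_of_lead(0.46, 0.42, 1013, 0.90):.1%}")  # ±4.8%
```

A 4% lead with a ±5.8% margin on the difference is why such a race is often called a statistical tie even though each individual result has a margin of only about ±3.1%.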


Effect of finite population size

The formulae above for the margin of error assume that there is an infinitely large population, and thus do not depend on the size of the population N but only on the sample size n. According to sampling theory, this assumption is reasonable when the sampling fraction is small. The margin of error for a particular sampling method is essentially the same regardless of whether the population of interest is the size of a school, city, state, or country, as long as the sampling ''fraction'' is small. In cases where the sampling fraction is larger (in practice, greater than 5%), analysts might adjust the margin of error using a finite population correction (FPC) to account for the added precision gained by sampling a much larger percentage of the population. The FPC can be calculated using the formula

:\operatorname{FPC} = \sqrt{\frac{N-n}{N-1}}

...and so, if poll P were conducted over 24% of, say, an electorate of 300,000 voters (n = 72{,}000),

:MOE_{95}(0.5) = z_{0.95}\sigma_{\overline{p}} \approx \frac{0.98}{\sqrt{72{,}000}} = \pm 0.4\%
:MOE_{95,FPC}(0.5) = z_{0.95}\sigma_{\overline{p}}\sqrt{\frac{N-n}{N-1}} \approx \frac{0.98}{\sqrt{72{,}000}}\sqrt{\frac{300{,}000-72{,}000}{300{,}000-1}} = \pm 0.3\%

Intuitively, for appropriately large N,

:\lim_{n \to 0} \sqrt{\frac{N-n}{N-1}} \approx 1
:\lim_{n \to N} \sqrt{\frac{N-n}{N-1}} = 0

In the former case, n is so small as to require no correction. In the latter case, the poll effectively becomes a census and sampling error becomes moot.
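The correction is a simple multiplier on the uncorrected margin; as a sketch (helper names are ours):

```python
def fpc(N: int, n: int) -> float:
    """Finite population correction: sqrt((N - n) / (N - 1))."""
    return ((N - n) / (N - 1)) ** 0.5

def max_moe_fpc(N: int, n: int, z: float = 1.96) -> float:
    """Worst-case 95% MOE (p = 0.5) with the finite population correction."""
    return z * (0.25 / n) ** 0.5 * fpc(N, n)

N, n = 300_000, 72_000  # a 24% sampling fraction
print(round(max_moe_fpc(N, n), 4))  # 0.0032, i.e. about ±0.3%
```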


See also

* Engineering tolerance
* Key relevance
* Measurement uncertainty
* Random error



External links

* {{mathworld | urlname = MarginofError | title = Margin of Error}}